Gibrat's law for cities: 
uniformly most powerful unbiased test 
of the Pareto against the lognormal 

Y. Malevergne 1,2,4, V. Pisarenko 3 and D. Sornette 4,5
1 Université de Lyon - Université de Saint-Etienne - Coactis E.A. 4161, France
2 EMLYON Business School - Cefra, France 

3 International Institute of Earthquake Prediction Theory and Mathematical Geophysics 

Russian Academy of Science, Moscow, Russia 

4 ETH Zurich - Department of Management, Technology and Economics, Switzerland 

5 Swiss Finance Institute, Switzerland 
e-mails: malevergne@em-lyon.com, pisarenko@yasenevo.ru and dsornette@ethz.ch 



Abstract 

We provide definitive results to close the debate between Eeckhout (2004, 2009) and Levy (2009) 
on the validity of Zipf's law, which is the special Pareto law with tail exponent 1, to describe the tail of
the distribution of U.S. city sizes. Because the origin of the disagreement between Eeckhout and Levy 
stems from the limited power of their tests, we perform the uniformly most powerful unbiased test for 
the null hypothesis of the Pareto distribution against the lognormal. The p-value and Hill's estimator 
as a function of city size lower threshold confirm indubitably that the size distribution of the 1000 
largest cities or so, which include more than half of the total U.S. population, is Pareto, but we rule out 
that the tail exponent, estimated to be 1.4 ± 0.1, is equal to 1. For larger ranks, the p-value becomes 
very small and Hill's estimator decays systematically with decreasing ranks, qualifying the lognormal 
distribution as the better model for the set of smaller cities. These two results reconcile the opposite 
views of Eeckhout (2004) and Levy (2009). We explain how Gibrat's law of proportional growth 
underpins both the Pareto and lognormal distributions and stress the key ingredient at the origin of 
their difference in standard stochastic growth models of cities (Gabaix 1999, Eeckhout 2004). 



JEL classification: D30, D51, J61, R12. 
Based upon the U.S. Census 2000 data, Eeckhout (2004) reports that the whole size distribution of
cities is lognormal rather than Pareto. This conclusion is obtained by using the Lilliefors test (L-test)
(Lilliefors 1967, Stephens 1974) for normal distributions with empirical mean $\mu = 7.28$ and standard
deviation $\sigma = 1.25$. This empirical conclusion is consistent with Gibrat's law of proportionate effect
and is rationalized by an equilibrium theory of local externalities in which the driving force is a random 
productivity process of local economies and the perfect mobility of workers. 

Levy (2009) argues that the top 0.6% of the largest cities of the U.S. Census 2000 data sample, which 
accounts for more than 23% of the population, dramatically departs from the lognormal distribution and 
is more in agreement with a power law (Pareto) distribution. The bulk of the distribution actually follows 
a lognormal but, due to the departure in the upper tail, a $\chi^2$-test unequivocally rejects the null of a
lognormal for cities whose log-size is larger than $\mu + 3\sigma = 12.53$. The non-rejection of the lognormal
by the L-test used by Eeckhout (2004) is ascribed to the fact that the relative number of cities in the upper 
tail is very small (only 0.6% of the sample), and the L-test is dominated by the center of the distribution 
rather than by its tail, where the interesting action occurs. 

In reply, Eeckhout (2009) provides the 95% -confidence bands of the lognormal estimates based upon 
the L-test and shows that the tail of the sample distribution of log-size is well within the confidence bands. 
In addition, Eeckhout asserts that "both [Pareto and lognormal] distributions are regularly varying, i.e. 
they are heavy tailed, and their tails have similar properties. [...] It is natural that the upper tail of city 
sizes can be fit to a Pareto distribution". Therefore "[g]iven that the tail of a lognormal is indistinguish- 
able from the Pareto under certain circumstances, the researcher who is interested in the tail properties 
of a size distribution can choose which one to use." 

In the first part of this comment, we summarize the properties that often make it difficult to
distinguish between the Pareto and the lognormal distributions. While the Pareto and the lognormal
distributions have indeed distinct asymptotic tails - in contrast with the Pareto, the lognormal is not 
regularly varying but rapidly varying - the lognormal can easily be mistaken for a Pareto over a range 
which can cover several decades as soon as its standard deviation is sufficiently large (a few units is 
sufficient). Furthermore, both distributions may be generated by Gibrat's law of proportional growth, 
with some additional apparently innocuous but actually profound twist(s) for the Pareto. In a second 
part, using exactly the same data set, we find that the origin of the disagreement between Eeckhout and 
Levy stems from the limited power of their tests. Using the uniformly most powerful unbiased test for 
the null hypothesis of a Pareto distribution against the lognormal, we confirm and extend Levy's result, 
by showing that the Pareto model holds for the 1000 largest cities or so, i.e. for more than 50% of the 
total population. Zipf's law, corresponding to Pareto with exponent 1, is found incompatible with the
data at the 90% confidence level. The Pareto index for the uppermost tail (about 1000 largest cities) is 
approximately 1.4. 

1 Why the Pareto and the Lognormal distributions are difficult to distinguish

1.1 Structural similarities and differences 

In order to justify that Levy's results are compatible with his own, Eeckhout (2009) asserts that both the
Pareto distribution and the lognormal distribution are regularly varying, which makes their tails
indistinguishable. We recall that a positive function $f(x)$ is regularly varying at infinity if there exists a finite
real number $\alpha$ such that (Bingham et al. 1987)

$$\lim_{t \to \infty} \frac{f(t x)}{f(t)} = x^{\alpha}, \qquad \forall x > 0. \qquad (1)$$

Pareto distributions are regularly varying. However, it is not the case for lognormal distributions. Indeed,
the lognormal density reads

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{1}{x}\,\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \qquad (2)$$

so that

$$\lim_{t \to \infty} \frac{f(t x)}{f(t)} = 0, \qquad \forall x > 1. \qquad (3)$$

This limit behavior characterizes a rapidly decreasing function at infinity. Therefore, Pareto and lognormal
distributions exhibit qualitatively different behaviors in their upper tails. The lognormal density goes
to zero, in the upper tail, faster than any Pareto density. In this respect, they cannot be mistaken for one
another, provided that one has enough data to sample the tail.
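The contrast between regular and rapid variation can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the Pareto exponent 1.4 and the lognormal parameters 7.28 and 1.25 are simply the values quoted in the text. It evaluates the ratio $f(tx)/f(t)$ for $x = 2$ and increasing $t$: the ratio stays constant for the Pareto density and drifts towards zero for the lognormal density.

```python
import math

def log_pareto_density(x, alpha, u=1.0):
    """Log of the Pareto density alpha * u**alpha / x**(alpha + 1), for x >= u."""
    return math.log(alpha) + alpha * math.log(u) - (alpha + 1.0) * math.log(x)

def log_lognormal_density(x, mu, sigma):
    """Log of the lognormal density with log-mean mu and log-standard deviation sigma."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def tail_ratio(log_density, t, x):
    """Ratio f(t * x) / f(t), computed in log space to avoid underflow."""
    return math.exp(log_density(t * x) - log_density(t))

if __name__ == "__main__":
    for t in (1e3, 1e6, 1e9, 1e12):
        r_pareto = tail_ratio(lambda s: log_pareto_density(s, 1.4), t, 2.0)
        r_lognormal = tail_ratio(lambda s: log_lognormal_density(s, 7.28, 1.25), t, 2.0)
        print(f"t = {t:.0e}  Pareto: {r_pareto:.4f}  lognormal: {r_lognormal:.3e}")
```

For the Pareto density the ratio equals $x^{-(1+\alpha)}$ whatever the value of $t$, which is the regular variation property with index $-(1+\alpha)$.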

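However, writing the lognormal density as follows

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{1}{x}\,\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\mu^2}{2\sigma^2}}\, x^{-1 + \frac{\mu}{\sigma^2} - \frac{\ln x}{2\sigma^2}}, \qquad (4)$$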

we observe that the lognormal distribution is superficially like a Pareto distribution with a slowly
increasing effective exponent

$$\alpha(x) = \frac{1}{2\sigma^2}\,\ln\frac{x}{e^{2\mu}}. \qquad (5)$$

Expression (5) allows us to make two points. First, as stated above, it shows that the lognormal
distribution decays at infinity faster than any Pareto distribution, since the apparent exponent $\alpha(x)$ diverges with
$x$. Second, if $\sigma^2$ is large enough, the apparent exponent $\alpha(x)$ varies so slowly as to give the impression
of constancy over several decades in $x$. Quantitatively, in the range $X < x < \lambda X$, the apparent exponent
varies from $\alpha(X)$ to $\alpha(X) + \frac{1}{2\sigma^2} \ln \lambda$. For instance, for $\sigma = 3.4$, the apparent exponent varies by no
more than 0.3 over three decades ($\lambda = 1000$).

However, with the smaller estimate $\sigma = 1.25$ provided by Eeckhout (2004) for the U.S. Census
2000 data, the apparent exponent varies by 1.5 units over just two decades. This is an indication that a
powerful test, as implemented in the next section, should be able to distinguish the two hypotheses over 
a range of two to three decades corresponding to the tail regime suggested by Levy (2009). 
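The two numerical statements above can be reproduced with a few lines of code. This is a minimal sketch under the assumption that the effective exponent implied by expression (4) is $\alpha(x) = \ln(x / e^{2\mu}) / (2\sigma^2)$, so that its variation over a range $X < x < \lambda X$ equals $\ln \lambda / (2\sigma^2)$ whatever $\mu$ and $X$; the function names are hypothetical.

```python
import math

def effective_exponent(x, mu, sigma):
    """Apparent Pareto exponent alpha(x) = ln(x / exp(2*mu)) / (2*sigma**2) of a lognormal."""
    return (math.log(x) - 2.0 * mu) / (2.0 * sigma ** 2)

def exponent_variation(sigma, decades):
    """Change of alpha(x) when x grows by `decades` powers of ten (independent of mu)."""
    return decades * math.log(10.0) / (2.0 * sigma ** 2)

if __name__ == "__main__":
    # sigma = 3.4 over three decades, then sigma = 1.25 over two decades
    print(round(exponent_variation(3.4, 3), 3))
    print(round(exponent_variation(1.25, 2), 3))
```

With $\sigma = 3.4$ the variation over three decades is close to 0.3, whereas with $\sigma = 1.25$ it is close to 1.5 over two decades only, in line with the figures quoted in the text.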



1.2 Generating process 

Gibrat's law of proportional growth is often taken as a key starting point to understand the origin of 
the distribution of city sizes (see the recent review by Saichev et al. (2009) and references therein). 
Eeckhout (2009) also stressed that Gibrat's law remains the cornerstone for building economic models
of population dynamics. Considered as the unique ingredient, Gibrat's law predicts that the distribution
of city sizes should tend to a lognormal distribution, but as a more and more degenerate one as time
increases. Indeed, under Gibrat's law, the log-size of a given city follows a random walk,
which therefore never admits a steady-state distribution.

The equation of city growth embodying Gibrat's law is

$$S_{i,t} = a_{i,t} \cdot S_{i,t-1}, \qquad (6)$$

where $S_{i,t}$ is the size of city $i$ at time $t$ and $a_{i,t}$ is the random positive growth factor. Taking the
logarithm of (6) and iterating yields

$$\ln S_{i,t} = \ln S_{i,t-1} + \eta_{i,t} = \ln S_{i,0} + \eta_{i,1} + \eta_{i,2} + \dots + \eta_{i,t}, \qquad (7)$$

where $\eta_{i,t} = \ln a_{i,t}$. Assuming (for a time) that the terms $\eta_{i,t}$ are iid random variables with expectation $A$
and standard deviation $B$, the Central Limit Theorem gives, for large $t$,

$$\ln S_{i,t} \simeq t \cdot A + \sqrt{t} \cdot B \cdot \xi, \qquad (8)$$

where $\xi$ is a standard Gaussian random variable $N(0, 1)$. Of course, the stationarity of the $\eta_{i,t}$'s should
be verified by an appropriate analysis. Assuming in addition that the stochastic growth process for a
typical city as a function of time is equivalent to sampling the growth of many cities at a given instant,
i.e., that a strong form of ergodicity holds, expression (8) ensures that the distribution of city sizes is
lognormal, i.e., the variable $\frac{\ln S_{i,t} - t \cdot A}{\sqrt{t} \cdot B}$ is $N(0, 1)$.
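The drift and the spreading of the log-sizes under pure proportional growth can be illustrated by simulation. The sketch below is only an illustration: it assumes Gaussian log-growth rates (the argument in the text only requires iid terms with finite variance), starts every city from unit size, and uses hypothetical parameter values and function names.

```python
import math
import random
import statistics

def simulate_gibrat_log_sizes(n_cities, n_steps, mean_log_growth, sd_log_growth, seed=0):
    """Return ln S_{i,t} after n_steps of pure proportional growth, starting from S_{i,0} = 1."""
    rng = random.Random(seed)
    log_sizes = []
    for _ in range(n_cities):
        log_s = 0.0  # ln S_{i,0} = 0
        for _ in range(n_steps):
            # eta_{i,t} = ln a_{i,t}, assumed Gaussian here for illustration only
            log_s += rng.gauss(mean_log_growth, sd_log_growth)
        log_sizes.append(log_s)
    return log_sizes

if __name__ == "__main__":
    A, B, t = 0.01, 0.1, 100
    logs = simulate_gibrat_log_sizes(1000, t, A, B, seed=1)
    print("mean of ln S:", statistics.fmean(logs), "expected about", t * A)
    print("sd of ln S:  ", statistics.stdev(logs), "expected about", math.sqrt(t) * B)
```

The standard deviation of the log-sizes grows like $\sqrt{t} \cdot B$ without bound, which is the sense in which the lognormal distribution becomes more and more degenerate as time increases.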

As recalled for instance by Gabaix (1999), an apparently minor modification leads to a bona fide 
steady state and, therefore, to a stationary distribution of city sizes. This modification, which can take 
many forms (Sornette 1998), consists in preventing the small cities from becoming too small. The 
corresponding generic equation of motion for city sizes embodying this idea together with Gibrat's law 
is (Gabaix 1999) 

$$S_{i,t} = a_{i,t} \cdot S_{i,t-1} + \epsilon_{i,t}, \qquad (9)$$

where the terms $\epsilon_{i,t} > 0$ prevent the accumulation of a large number of cities with vanishingly small
sizes. In the absence of $\epsilon_{i,t}$, expression (9) is nothing but the random walk in log-size leading to the
lognormal distribution obtained from (8). Because the process (9) with non-zero $\epsilon_{i,t}$ leads to a stationary
distribution¹, if we assume ergodicity, then the distribution of an ensemble of cities is the same as that of
the set of realizations $\{S_{i,t}\}$ for a fixed city $i$ as a function of $t$ for large times.

The presence of the "minor modification" $\epsilon_{i,t} > 0$ ensures that the size distribution of cities switches
from a lognormal to a Pareto, even if it is arbitrarily small, as long as it is non-zero (Kesten 1973). The
tail index $\alpha$ of the Pareto distribution is the solution to $E[(a_{i,t})^{\alpha}] = 1$. Gabaix (1999) argued for the
validity of the constraint $E[a_{i,t}] = 1$, which then leads to Zipf's law: $\alpha = 1$. Saichev et al. (2009) show
that Zipf's law is more realistically the result of Gibrat's law together with a condition balancing the
birth rate, random growth and possible death rate of cities².
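The condition $E[(a_{i,t})^{\alpha}] = 1$ can be solved numerically once a law for the growth factor is specified. The sketch below is an illustration only: it assumes, hypothetically, a lognormal growth factor with $\ln a \sim N(m, s^2)$, for which $E[a^{\alpha}] = \exp(\alpha m + \alpha^2 s^2 / 2)$ and the root is $\alpha = -2m/s^2$ in closed form, and it uses a plain bisection so that any other moment function could be plugged in. The function names are not from the original.

```python
import math

def solve_tail_index(moment, lower=1e-6, upper=50.0, tol=1e-10):
    """Bisection for the positive root alpha of moment(alpha) = E[a**alpha] = 1.

    Assumes moment(lower) < 1 < moment(upper), which holds when E[ln a] < 0
    and a exceeds 1 with positive probability (the moment function is convex).
    """
    f_lo = moment(lower) - 1.0
    f_hi = moment(upper) - 1.0
    if not (f_lo < 0.0 < f_hi):
        raise ValueError("root not bracketed: check E[ln a] < 0 and the upper bound")
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if moment(mid) - 1.0 < 0.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)

def lognormal_moment(m, s):
    """E[a**alpha] for ln a ~ N(m, s**2), a hypothetical choice of growth-factor law."""
    return lambda alpha: math.exp(alpha * m + 0.5 * (alpha * s) ** 2)

if __name__ == "__main__":
    # m = -s**2 / 2 gives E[a] = 1 and hence Zipf's law
    print(solve_tail_index(lognormal_moment(-0.02, 0.2)))
    # A more negative drift gives a larger tail index
    print(solve_tail_index(lognormal_moment(-0.028, 0.2)))
```

In this lognormal illustration, the constraint $E[a_{i,t}] = 1$ corresponds to $m = -s^2/2$ and yields $\alpha = 1$, while a more negative mean log-growth rate yields a larger tail index.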

The intuition behind the transformation of the lognormal into the Pareto distribution, upon the
introduction of the apparently minor additive term $\epsilon_{i,t} > 0$, is the following. Because of the stationarity
condition $E[\ln a_{i,t}] < 0$, in the absence of $\epsilon_{i,t}$, the process $S_{i,t}$ tends to shrink stochastically towards
zero, while exhibiting a more and more degenerate lognormal distribution. During this phase, a few
excursions of exponentially large sizes associated with transient occurrences of the growth factor $a_{i,t}$ larger
than 1 can occur with exponentially small probability. The term $\epsilon_{i,t}$ allows the process to repeatedly
exhibit the exponentially rare exponentially large excursions. The combination of these two exponentials
leads to the Pareto distribution³.

Eeckhout (2004)'s model provides an expression for the growth of cities of the form (6), with
$a_{i,t} = 1/\Lambda^{-1}(1 + \sigma_{i,t})$ as defined on page 1447. The function $\Lambda(S_{i,t}) \sim S_{i,t}^{\Theta}$ denotes the net local
size effect on the growth of cities, $\sigma_{i,t}$ corresponds to the exogenous technology shock impacting city $i$ at
time $t$ and $\Theta = -(\theta - \gamma - \beta/\alpha)$ in the notations of Eeckhout (2004). The exponents $\alpha$ and $\beta$ (Eeckhout's
$\alpha$, not to be confused with the tail exponent) quantify
the consumer preference with respect to consumption, amount of land and housing, and leisure. The
exponent $\theta$ describes the dependence of the positive externality of being in a city of size $S$. The exponent $\gamma$
describes the dependence of the negative external effect of how leisure can be used for labor. Then, any
mechanism ensuring a minimum (even random) city size, which helps to transform (6) into (9) or an equivalent
(Sornette 1998), leads to the Pareto distribution for the tail of the distribution of city sizes with tail
exponent $\alpha = -\Theta$. Since $\Theta < 0$, the "net local size effect" $\Lambda(S_{i,t})$ is an inverse power of the city size so
that a faster decay of the tail of the distribution of city sizes corresponds to a weaker relative impact of
net local externalities $\Lambda(S_{i,t})$ on large cities compared to smaller cities. Zipf's law is recovered for the
special case $\Theta = -1$.

¹ The condition for stationarity is $E[\ln a_{i,t}] < 0$.

² In the case of cities, death means falling below a moving threshold for qualifying as a city.

³ For the more realistic situation where cities are on average growing, $\epsilon_{i,t}$ is replaced by an exponentially growing term, so as to represent
immigration or population fluxes across cities for instance; the same reasoning applies once a change of frame has been
performed with respect to the exponentially growing $\epsilon_{i,t}$ term (see Sornette (1998) for details).



2 Testing the Pareto against the lognormal distribution 
2.1 The uniformly most powerful unbiased test 

As summarized in the introduction, Eeckhout (2004) and Levy (2009) have used general tests (L-test and
$\chi^2$-test respectively) of the null hypothesis that the whole sample or just the upper tail is generated by
a lognormal distribution, and they reach opposite conclusions. While these two tests are quite versatile,
they are not always very powerful. For the purpose of comparing the lognormal distribution with Zipf's
law, their lack of power can be ascribed to the fact that they test the null hypothesis against any alternative
distribution, and not specifically against the Pareto distribution. But the latter is the alternative of interest.
For instance, figure 2 in (Eeckhout 2009) illustrates the dramatic lack of power of the L-test in the upper
tail of the distribution under the null of a lognormal: the confidence bands derived from this test fan
out very strongly, which makes this test completely unable to decide if the deviations observed in the
data are genuine or spurious. Of course, the main reason for the decreasing power observed in figure 2 in
(Eeckhout 2009) is the shrinking sample size for the upper ranks, but this does not remove the necessity
of using the most powerful test possible in such a situation.

The discussion following equations (4) and (5) suggests that it might be possible to clearly distinguish
between the explanatory power offered by a lognormal distribution versus a Pareto distribution for the
U.S. Census 2000 data sample, when using a more powerful test. The most general test that addresses the
core question, whether the Pareto law holds in the tail or the lognormal model is sufficient, is to consider
the two hypotheses: Pareto distribution for values of $x$ larger than some threshold $u$ and lognormal
distribution also for values of $x$ above the same threshold $u$. Specifically, we propose to test the null
hypothesis that, beyond some threshold $u$, the upper tail of the size distribution of cities is Pareto

$$H_0: \quad f_0(x; \alpha) = \alpha\,\frac{u^{\alpha}}{x^{\alpha+1}} \cdot 1_{x > u}, \qquad \alpha > 0, \qquad (10)$$

against the alternative that it is a (truncated) lognormal

$$H_1: \quad f_1(x; \alpha, \beta) = \sqrt{\frac{\beta}{\pi}}\; \frac{e^{-\alpha^2/(4\beta)}}{1 - \Phi\left(\alpha/\sqrt{2\beta}\right)}\; \frac{1}{x}\, \exp\left[-\alpha \ln\frac{x}{u} - \beta \ln^2\frac{x}{u}\right] \cdot 1_{x > u}, \qquad \alpha \in \mathbb{R},\ \beta > 0, \qquad (11)$$

where $\Phi(\cdot)$ denotes the CDF of the standard normal distribution.

This is equivalent to testing the null hypothesis that the upper tail of the log-size distribution of cities
is exponential against the alternative that it is a (truncated) normal. For this latter problem, Del Castillo
and Puig (1999) have shown that the clipped sample coefficient of variation $\hat{c} = \min(1, c)$ provides the
uniformly most powerful unbiased test, where $c$ is the sample coefficient of variation defined as the ratio
of the sample standard deviation to the sample mean. The critical point of the test can be derived with
extremely high accuracy (even for very small samples) by a saddle point approximation (Del Castillo
and Puig 1999, Gatto and Jammalamadaka 2002) or by Monte Carlo methods.
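The Monte Carlo route mentioned above can be sketched in a few lines. This is a minimal illustration and not the saddle point computation used for the results below: it assumes that the statistic is computed on the log-excesses $\ln(x/u)$, that the population (divide-by-$n$) standard deviation is used, and that the p-value is the null probability of observing a clipped coefficient of variation at or below the observed one, since the truncated normal alternative produces values below 1. The function names are hypothetical.

```python
import math
import random

def clipped_cv(values):
    """Clipped coefficient of variation min(1, sd / mean) of positive values."""
    n = len(values)
    mean = math.fsum(values) / n
    var = math.fsum((v - mean) ** 2 for v in values) / n
    return min(1.0, math.sqrt(var) / mean)

def pareto_vs_lognormal_pvalue(sizes, threshold, n_sim=2000, seed=0):
    """Monte Carlo p-value of the Pareto null against the truncated lognormal.

    Small values of the statistic point towards the lognormal, so the p-value
    is the null probability of a statistic at or below the observed one.
    """
    y = [math.log(x / threshold) for x in sizes if x > threshold]
    c_obs = clipped_cv(y)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Under the null the log-excesses are exponential; the statistic is
        # scale invariant, so a unit rate can be used whatever the tail index.
        sim = [rng.expovariate(1.0) for _ in range(len(y))]
        if clipped_cv(sim) <= c_obs:
            hits += 1
    return c_obs, (hits + 1) / (n_sim + 1)

if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic Pareto sample with tail index 1.4 above u = 1 (inverse transform)
    pareto_sample = [(1.0 - rng.random()) ** (-1.0 / 1.4) for _ in range(500)]
    print(pareto_vs_lognormal_pvalue(pareto_sample, 1.0, n_sim=500))
```

Applying the function to the sizes above a sequence of rank thresholds gives a p-value curve of the kind discussed in the next subsection.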

2.2 Results 

The upper panel of figure 1 depicts the p-value of the test as a function of the lower threshold $u$, expressed
in terms of the rank of city sizes and represented on a logarithmic scale. The p-values have been calculated
using the saddle point approximation (Del Castillo and Puig 1999, Gatto and Jammalamadaka 2002).
Extensive Monte Carlo simulations reproduce essentially the same results. Figure 1 indubitably shows that
the size distribution of the 1000 largest cities or so, which include more than half of the total population,
is Pareto. This confirms and makes more precise the claim of Levy (2009). For larger ranks, the p-value
becomes very small, qualifying the lognormal distribution as the better model for the set of smaller cities.
This explains Eeckhout (2004)'s results.

The lower panel of figure 1 depicts Hill's estimate $\hat{\alpha}^{-1}$ of the inverse of the tail index $\alpha$ of the Pareto
distribution (10), again as a function of city rank. This estimator is the best unbiased estimator for the
inverse of the tail index⁴ (Hill 1975). For the U.S. Census 2000 data (blue upper noisy curve), the inverse
of the tail index is approximately constant and fluctuates around the value 0.7 for ranks less than one
thousand or so, confirming the validity of the Pareto model over this range. For ranks larger than one
thousand, Hill's estimate $\hat{\alpha}^{-1}$ deviates rapidly, confirming a deviation from the Pareto model for the
set of smaller cities.
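Hill's estimate as a function of the rank threshold is straightforward to compute. The sketch below is a minimal illustration in which the $(k+1)$-th largest observation plays the role of the threshold $u$ and the estimate of $1/\alpha$ is the mean log-excess of the $k$ largest observations; the function names and the synthetic sample are hypothetical and do not reproduce the Census data.

```python
import math
import random

def hill_inverse_tail_index(sizes, k):
    """Hill's estimate of 1/alpha from the k largest values.

    The (k+1)-th largest observation plays the role of the threshold u.
    """
    ordered = sorted(sizes, reverse=True)
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < len(sizes)")
    log_u = math.log(ordered[k])
    return math.fsum(math.log(x) - log_u for x in ordered[:k]) / k

def hill_curve(sizes, ranks):
    """Hill's estimate of 1/alpha for each rank threshold in `ranks`."""
    return [(k, hill_inverse_tail_index(sizes, k)) for k in ranks]

if __name__ == "__main__":
    rng = random.Random(3)
    # Synthetic Pareto sample with tail index 1.4 (inverse transform sampling)
    sample = [(1.0 - rng.random()) ** (-1.0 / 1.4) for _ in range(5000)]
    for k, estimate in hill_curve(sample, [10, 100, 1000]):
        print(k, round(estimate, 3))
```

For a genuinely Pareto sample, the estimates fluctuate around $1/\alpha$ at all ranks, which is the plateau behavior referred to above.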

In the lower panel of figure 1, we also show Hill's estimate $\hat{\alpha}^{-1}$ for ten random samples drawn
from a lognormal distribution with parameters $\mu = 7.28$ and $\sigma = 1.25$ (red curves). One can observe
the absence of a plateau, and therefore no well-defined exponent, thus disqualifying the Pareto model.
The increase of $\hat{\alpha}^{-1}$ with rank is the expected signature of the fact that the lognormal density is rapidly
decreasing, i.e., it goes to zero faster than any power law, so that its effective tail index is equal to
infinity and its inverse is vanishing. Therefore, for very low ranks (largest cities), Hill's estimator should
converge to zero for data generated by a lognormal distribution.

The contrast between the U.S. Census 2000 data and the samples drawn from a lognormal distribution 
with parameters $\mu = 7.28$ and $\sigma = 1.25$ is striking and provides additional evidence in favor of the Pareto
distribution for the upper tail. This makes clear that the Pareto and lognormal models are distinguishable 
in their tail for the available U.S. Census 2000 data sample. 

2.3 Pareto model versus Zipf's law

Now that we have established that the tail of the size distribution of cities is Pareto, we turn to the
question of whether this Pareto law is Zipf's law, i.e., whether the exponent is $\alpha = 1$.

First, the lower panel of figure 1 shows the confidence bands at the 5% and 1% significance levels,
derived from the uniformly most powerful unbiased test of the hypothesis that the tail index $\alpha = 1$ against a two-sided
alternative (Lehmann and Romano 2006). At the 5% significance level, Zipf's law is rejected, except
for the twenty largest cities. Figure 2 refines this result by plotting the p-value defined as the
probability of exceeding the observed index estimate (one-sided test) under the hypothesis that Zipf's law
holds (index equal to unity). For rank thresholds larger than 20, all p-values are smaller than 0.05. For
rank thresholds larger than 16, all p-values are smaller than 0.10. We are thus led to conclude that Zipf's
law cannot be accepted to describe the tail of the distribution of city sizes in the U.S. Census studied here,
whereas a larger exponent approximately equal to 1.4 is significantly more likely.

⁴ It is not possible to get an unbiased estimate for $\alpha$ itself.

Coming back to Eeckhout (2004)'s model, our finding $\alpha = -\Theta \approx 1.4$ implies that the "net local
size effect" $\Lambda(S_{i,t})$ decreases faster with city size $S_{i,t}$ than would be the case if Zipf's law held exactly.
We also refer to Saichev et al. (2009) for a review of the mechanisms based on Gibrat's law leading to
distributions with Pareto tails whose exponents can deviate from the Zipf's law value $\alpha = 1$.
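The one-sided p-value for Zipf's law can be computed exactly under an idealized assumption. The sketch below assumes that the $k$ log-excesses above the threshold are iid exponential, as they are for an exact Pareto tail, so that under $\alpha = 1$ their sum $T = k / \hat{\alpha}$ follows a Gamma distribution with shape $k$ and unit scale; the probability that the index estimate exceeds the observed one is then $P(T \le k / \hat{\alpha}_{\rm obs})$. The function names are hypothetical, and this is not necessarily the computation used to produce figure 2.

```python
import math

def gamma_cdf_integer_shape(k, t):
    """P(G <= t) for G ~ Gamma(shape=k, scale=1) with integer k >= 1.

    Uses the identity P(G <= t) = P(N >= k) for N ~ Poisson(t).
    """
    if t <= 0.0:
        return 0.0

    def log_pmf(j):
        return -t + j * math.log(t) - math.lgamma(j + 1)

    if k > t:
        # Upper Poisson tail, summed directly so that tiny p-values keep precision.
        term = math.exp(log_pmf(k))
        total, j = 0.0, k
        while term > 1e-17 * total:
            total += term
            j += 1
            term *= t / j
        return total
    # Otherwise sum the lower tail downwards from j = k - 1 and take the complement.
    term = math.exp(log_pmf(k - 1))
    lower = 0.0
    for j in range(k - 1, -1, -1):
        lower += term
        term *= j / t
    return 1.0 - lower

def zipf_pvalue(alpha_hat, k):
    """One-sided p-value P(index estimate > alpha_hat) when the true tail index is 1."""
    return gamma_cdf_integer_shape(k, k / alpha_hat)

if __name__ == "__main__":
    for k in (20, 100, 1000):
        print(k, zipf_pvalue(1.4, k))
```

For a fixed estimate of 1.4, the p-value shrinks quickly as the number $k$ of cities above the threshold grows, which is why Zipf's law can only survive the test for the very largest cities.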
Figure 1: The upper panel depicts the p-value of the test of the null hypothesis that the upper tail of the
size distribution of cities is Pareto against the alternative that it is a (truncated) lognormal as a function of
the rank threshold, where cities are ordered by decreasing sizes. The lower panel depicts Hill's estimate
of the inverse of the tail index for the Census 2000 data (blue upper curve) and for ten samples drawn from
a lognormal distribution with parameters $\mu = 7.28$ and $\sigma = 1.25$ (red bottom curves). The two dashed
(respectively dot-dash) curves provide the confidence bands at the 5% significance level (respectively
1% level) derived from the UMPU test that the tail index $\alpha = 1$ against a two-sided alternative.
Figure 2: One-sided p-value as a function of rank threshold, testing the hypothesis that the tail exponent
of the Pareto distribution is compatible with Zipf's law, i.e., that $\alpha = 1$. The p-value is defined as the
probability of exceeding the observed index estimate under the hypothesis that Zipf's law holds.